ORAN System: A Basis for an Arabic OCR

Author

  • Abdelmalek Zidouri
Abstract

In this paper we present a system for document understanding and recognition of printed Arabic text. Arabic characters must be segmented before recognition. We address the segmentation problem with our proposed ORAN system (Offline Recognition of Arabic characters and Numerals), which is based on a method called Modified MCR. Using a stroke index, we parse compound document images into three categories: text, picture, and graphical patterns. Each resulting Arabic text block is then considered for further analysis and recognition, and its Modified MCR expression is obtained. Recognition is achieved by simple matching of candidate characters to reference prototypes. The prototypes are designed according to topological features of the strokes, obtained with respect to baseline detection and a zoning scheme. The recognition rate obtained for a popular Arabic font is higher than 97%.
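
The abstract does not give the details of the Modified MCR expression or of the prototype design, so the following is only a minimal sketch of the general idea of zoning strokes relative to a baseline and matching the result against reference prototypes. The stroke types, zone names, the `tolerance` parameter, and the three prototypes are all hypothetical, chosen for illustration, and exact matching is an assumption rather than the paper's actual matching rule.

```python
from typing import Dict, List, Optional, Tuple

# A feature is (stroke_type, zone). Both vocabularies are hypothetical
# and are NOT taken from the ORAN paper.
Feature = Tuple[str, str]

# Hypothetical reference prototypes, for illustration only.
PROTOTYPES: Dict[str, List[Feature]] = {
    "alef": [("vertical", "upper")],
    "ba": [("horizontal", "baseline"), ("dot", "lower")],
    "ta": [("horizontal", "baseline"), ("dot", "upper"), ("dot", "upper")],
}


def zone_of(y: int, baseline_y: int, tolerance: int = 2) -> str:
    """Assign a zone from a stroke's row; image rows grow downward."""
    if y < baseline_y - tolerance:
        return "upper"
    if y > baseline_y + tolerance:
        return "lower"
    return "baseline"


def describe(strokes: List[Tuple[str, int]], baseline_y: int) -> List[Feature]:
    """Turn (stroke_type, row) pairs into (stroke_type, zone) features."""
    return [(kind, zone_of(y, baseline_y)) for kind, y in strokes]


def match_prototype(candidate: List[Feature]) -> Optional[str]:
    """Return the label whose prototype has the same features, ignoring order."""
    key = sorted(candidate)
    for label, proto in PROTOTYPES.items():
        if sorted(proto) == key:
            return label
    return None


# Usage: a horizontal stroke on the baseline with one dot below it.
candidate = describe([("horizontal", 20), ("dot", 27)], baseline_y=20)
print(match_prototype(candidate))
```

A real system would need a tolerant matching rule and prototypes derived from actual font data; exact matching is used here only to keep the sketch short.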


Similar resources

Retrieval–travel-time model for free-fall-flow-rack automated storage and retrieval system

Automated storage and retrieval systems (AS/RSs) are material handling systems that are frequently used in manufacturing and distribution centers. The modelling of the retrieval–travel time of an AS/RS (expected product delivery time) is practically important, because it allows us to evaluate and improve the system throughput. The free-fall-flow-rack AS/RS has emerged as a new technology for dr...


A Real-time DSP-Based Optical Character Recognition System for Isolated Arabic characters using the TI TMS320C6416T

Optical Character Recognition (OCR) is an area of research that has attracted the interest of researchers for the past forty years. Although the subject has been the center topic for many researchers for years, it remains one of the most challenging and exciting areas in pattern recognition. Since Arabic is one of the most widely used languages in the world, the demand for a robust OCR for this...


Isolated Persian/Arabic handwriting characters: Derivative projection profile features, implemented on GPUs

For many years, researchers have studied high accuracy methods for recognizing the handwriting and achieved many significant improvements. However, an issue that has rarely been studied is the speed of these methods. Considering the computer hardware limitations, it is necessary for these methods to run in high speed. One of the methods to increase the processing speed is to use the computer pa...


A Database for Arabic Printed Character Recognition

Electronic Document Management (EDM) technology is being widely adopted as it makes for the efficient routing and retrieval of documents. Optical Character Recognition (OCR) is an important front end for such technology. Excellent OCR now exists for Latin-based languages, but there are few systems that read Arabic, which limits the penetration of EDM into Arabic-speaking countries. In developing...


High capacity steganography tool for Arabic text using 'Kashida'

Steganography is the ability to hide secret information in a cover-media such as sound, pictures and text. A new approach is proposed to hide a secret into Arabic text cover media using "Kashida", an Arabic extension character. The proposed approach is an attempt to maximize the use of "Kashida" to hide more information in Arabic text cover-media. To approach this, some algorithms have been des...




Publication date: 2004